[GLUTEN-4889][VL] feat: Support approx_percentile aggregate function#11651
[GLUTEN-4889][VL] feat: Support approx_percentile aggregate function#11651Yizhou-Yang wants to merge 17 commits intoapache:mainfrom
Conversation
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Please update get-velox.sh to test your PR, then you can verify if both can work well, you may update this line |
|
Do we need the config? Usually we offload the function to native by default |
added the 16320 and removed the config |
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
61dc2ca to
c76cb84
Compare
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
5 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Please update the PR description to describe the KLL Sketch is different so that we handle fallback separately. |
done~ |
1b402de to
b0679ad
Compare
|
Run Gluten Clickhouse CI on x86 |
jinchengchenghh
left a comment
There was a problem hiding this comment.
Most is good, after we resolve CH CI, we can merge it.
|
|
||
| const std::unordered_set<std::string> kBlackList = | ||
| {"split_part", "sequence", "approx_percentile", "map_from_arrays"}; | ||
| {"split_part", "sequence", "map_from_arrays"}; |
There was a problem hiding this comment.
I find also the map_from_arrays in black list, another topic, would you like to implement it in Gluten?
There was a problem hiding this comment.
sure, I will look into it. @jinchengchenghh
facebookincubator/velox#13183
I think that the velox part is compatible with spark as of 2025/05, it's just for some reason the velox fix is not adopted in gluten. Right now I think the only gluten fix needed is to remove the kblacklist and test it...
I'll do a comparison with spark, post the results and add appropriate ut.
|
Run Gluten Clickhouse CI on x86 |
…rt VeloxApproximatePercentile to DeclarativeAggregate with KLL sketch
…ckend ApproximatePercentile registration (Velox PR only)
62067d6 to
d14b740
Compare
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
…ministic across Spark and Velox implementations
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
There are test failures in x86 environment non-slow modes. edit: |
|
@Yizhou-Yang Thanks for your implementation! Recently I am cherrying pick your PR and I find that the KLL implementation in Gluten has only one level and discards items in odd position when compacting. I am wondering if this implementation can meet the |
What
Add Velox
approx_percentilesupport for Spark.Why
Velox uses KLL sketch while Spark uses GK algorithm — their intermediate data formats are incompatible (KLL: 9-field StructType vs GK: single BinaryType buffer). This means fallback between Velox and Spark requires separate handling.
How
VeloxApproximatePercentile: ADeclarativeAggregatewith 9aggBufferAttributesmatching Velox's KLL sketch layout.KllSketchHelper/KllSketchAdd/KllSketchMerge/KllSketchEval): Simplified KLL operations for fallback, binary-compatible with Velox's C++ accumulator.ApproxPercentileRewriteRule: Rewrites Spark'sApproximatePercentileto the Velox-compatible version.Key decisions
IntegerType(Spark's original value); Velox computesepsilon = 1.0/accuracyinternally.Velox dependency
facebookincubator/velox#16320
Related issue: #4889